ABSTRACT


Two  forces  engage  in  a  duel,  with  each  force  initially  consisting  of  several  heterogeneous 
units.  Each  unit  can  be  assigned  to  fire  at  any  opposing  unit,  but  the  kill  rate  depends  on 
the  assignment.  As  the  duel  proceeds,  each  force — knowing  which  units  are  still  alive  in 
real  time — decides  dynamically  how  to  assign  its  fire,  in  order  to  maximize  the 
probability  of  wiping  out  the  opposing  force  before  getting  wiped  out.  It  has  been  shown 
in  the  literature  that  an  optimal  pure  strategy  exists  for  this  two-person  zero-sum  game, 
but  computing  the  optimal  strategy  remained  cumbersome  because  of  the  game’s  huge 
payoff  matrix.  This  paper  gives  an  efficient  algorithm  to  compute  the  optimal  strategy 
without  enumerating  the  entire  payoff  matrix,  and  offers  some  insights  into  the  special 
case,  when  one  force  has  only  one  unit. 
1 Introduction


We consider a stochastic duel model with each force consisting of heterogeneous units. Suppose, at the beginning, force A has $m$ units and force B has $n$ units. If A's unit $i$ fires at B's unit $j$, then the time to kill follows an exponential distribution with rate $\lambda_{ij}$. If B's unit $j$ fires at A's unit $i$, then the time to kill follows an exponential distribution with rate $\theta_{ji}$. If multiple units fire at the same target, then the time to kill follows an exponential
distribution,  with  the  rate  equal  to  the  sum  of  individual  kill  rates.  Each  force  keeps  perfect 
knowledge  when  a  unit  gets  killed  and  decides  dynamically  how  to  assign  its  remaining  units 
to  fire  at  the  opposing  force’s  remaining  units.  The  goal  of  each  force  is  to  maximize  the 
probability  of  wiping  out  the  opposing  force  before  getting  wiped  out. 

This  stochastic  duel  model  was  first  studied  by  Kikuta  (1986),  and  it  was  shown  that  a 
pure optimal strategy exists. It is, however, rather cumbersome to determine the optimal
strategy,  because  one  needs  to  enumerate  a  huge  payoff  matrix.  Our  main  contribution 
in  this  paper  is  to  establish  a  necessary  and  sufficient  condition  for  a  pure  strategy  to  be 
optimal,  and  use  the  condition  to  facilitate  an  efficient  algorithm  to  compute  an  optimal 
strategy.  We  also  provide  some  insights  into  the  special  case,  when  one  force  has  only  one 
unit. 

Two special cases of the model have been reported in the literature. If each force has homogeneous units, such that $\lambda_{ij} = \lambda$ and $\theta_{ji} = \theta$ for all $i, j$, then any policy that keeps all units busy firing at any opposing unit is optimal. Let $V(m, n)$ denote A's win probability if A has $m$ units and B has $n$ units, for $m, n = 1, 2, \ldots$. A recursive equation can be derived by conditioning on whose unit is killed next, and is given by

$$V(m, n) = \frac{m\lambda}{m\lambda + n\theta}\, V(m, n-1) + \frac{n\theta}{m\lambda + n\theta}\, V(m-1, n),$$

with the boundary conditions $V(m, 0) = 1$ for $m \ge 1$, and $V(0, n) = 0$ for $n \ge 1$. Letting
$r = \lambda/\theta$, Brown (1963) showed that

$$V(m, n) = r^n \sum_{k=1}^{m} \frac{(-1)^{m-k}\, k^{m+n}\, \Gamma(rk + 1)}{(m-k)!\; k!\; \Gamma(n + rk + 1)}.$$
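As a numerical sanity check, the recursion and the closed form can be compared directly. The sketch below is illustrative only: the function names `win_prob_recursive` and `win_prob_closed_form` are hypothetical, and the closed form is coded with a leading factor $r^n$ and a sum over $k = 1, \ldots, m$, which is an assumption about how the formula reads (it agrees with the recursion on small cases).

```python
from functools import lru_cache
from math import factorial, gamma


def win_prob_recursive(m, n, lam, theta):
    """A's win probability with m homogeneous units against n, via the recursion."""

    @lru_cache(maxsize=None)
    def V(m_left, n_left):
        if n_left == 0:  # B is wiped out
            return 1.0
        if m_left == 0:  # A is wiped out
            return 0.0
        total = m_left * lam + n_left * theta
        # Condition on whose unit is killed next.
        return (m_left * lam / total) * V(m_left, n_left - 1) + (
            n_left * theta / total
        ) * V(m_left - 1, n_left)

    return V(m, n)


def win_prob_closed_form(m, n, r):
    """Closed form in terms of r = lam / theta (assumed reading of the formula)."""
    return r**n * sum(
        (-1) ** (m - k)
        * k ** (m + n)
        * gamma(r * k + 1)
        / (factorial(m - k) * factorial(k) * gamma(n + r * k + 1))
        for k in range(1, m + 1)
    )


if __name__ == "__main__":
    print(win_prob_recursive(2, 2, 1.0, 1.0), win_prob_closed_form(2, 2, 1.0))
```

For small forces the two functions agree to within floating-point error. For large $m$ the alternating sum can lose precision, so the recursion is likely the safer choice there.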


When  the  units  are  heterogeneous,  it  makes  a  difference  how  each  force  allocates  his  fire.  In 
addition,  the  fire  allocation  may  change  as  both  forces  lose  their  units  during  the  duel. 

Another special case, when $m = 1$, was previously studied by Friedman (1977) and Kikuta (1983), where A needs to determine which fire order, among the $n!$ possible fire orders, is optimal. In many sequencing problems, when one decides in which order to process a number of jobs, it is possible to compute an index for each job based on its own attributes, and to obtain the optimal sequence by sorting those indices (Ross, 1983; Gittins et al., 2011). Unfortunately, in this problem the preference between two targets depends on the other targets still alive, which makes the problem difficult. Friedman (1977) gave a necessary condition for the optimal order, while Kikuta (1983) strengthened the necessary condition and gave a sufficient condition for optimality. In general, however, to find the optimal fire order one needs to compare all $n!$ fire orders by brute force.
The study of duel models dates back to the 1910s, when Lanchester (1916) proposed differential equations that govern the strength of each force through time, which gave rise to what later became known as Lanchester models. A stream of works extended the Lanchester models, which are deterministic in nature, to stochastic duel models by introducing randomness to shot outcomes, time between taking shots, etc.; see, for instance, Brown (1963);
Williams  and  Ancker  (1963);  Barfoot  (1974);  Kress  (1992);  and  Kress  and  Talmor  (1999). 
These  stochastic  duel  models,  however,  assume  homogeneous  units,  so  there  is  no  decision 
making.  The  focus  of  earlier  works  was  to  obtain  expressions  for  win  probability  in  various 
duel  scenarios.  Readers  interested  in  comprehensive  surveys  on  combat  models  are  referred 
to  Ancker  (2006);  Washburn  and  Kress  (2009);  and  Kress  (2012). 

The  rest  of  this  paper  proceeds  as  follows.  Section  2  presents  the  main  results,  where 
we  give  a  necessary  and  sufficient  condition  for  a  pure  strategy  to  be  optimal,  and  then  use 
the  condition  to  facilitate  an  efficient  algorithm  to  compute  the  optimal  strategy.  Section  3 
discusses the special case when $m = 1$, and gives a condition under which the preference
between  two  targets  can  be  readily  determined,  regardless  of  the  other  targets  still  alive. 

2  Main  Results 

At the beginning, force A (or player A) has a set of units $S_A = \{1, 2, \ldots, m\}$, and force B (or player B) has a set of units $S_B = \{1, 2, \ldots, n\}$. As the duel proceeds, each player
keeps  real-time  knowledge  about  when  a  unit  gets  killed.  In  other  words,  each  player  has  full 
information  about  the  history  of  the  game  and,  at  any  time  point,  decides  how  to  allocate  his 
fire  on  the  opponent’s  remaining  units.  Because  we  assume  exponential  kill  rates,  knowing 
which  units  are  still  alive  on  both  sides,  the  future  of  the  game  becomes  independent  from 
its  past. 

At any time point, the state of the duel can be delineated by $(S'_A, S'_B)$, with $S'_A \subseteq S_A$ being the set of A's remaining units, and $S'_B \subseteq S_B$ the set of B's remaining units. The game belongs to the class of Markov games, because once in a state, the previous actions and results become irrelevant to the future of the game. It is also an exhaustive game according to the definition in Washburn (2003), because each state will be visited once at most. For a given state $(S'_A, S'_B)$, the two players can be viewed as playing a single-stage game, which ends as soon as any unit on either side is killed. In other words, letting $V(S'_A, S'_B)$ denote A's win probability in state $(S'_A, S'_B)$, the payoff to A is $V(S'_A \setminus \{i\}, S'_B)$ if A's unit $i$ is killed next, and is $V(S'_A, S'_B \setminus \{j\})$ if B's unit $j$ is killed next. In addition, because a player loses the game if he loses all his units, we have $V(S'_A, \emptyset) = 1$ if $S'_A \neq \emptyset$, and $V(\emptyset, S'_B) = 0$ if $S'_B \neq \emptyset$. Consequently, if we can solve this single-stage game, then we can compute the optimal strategy recursively on $|S'_A| + |S'_B|$, beginning from 1, 2, and so on.

The  rest  of  this  section  focuses  on  the  single-stage  game.  Section  2.1  recounts  how  to 
construct  a  single-stage  game  in  matrix  form,  as  was  done  in  Kikuta  (1986).  Section  2.2  gives 
a  necessary  and  sufficient  condition  for  a  saddle  point  in  this  matrix,  and  Section  2.3  gives 
an  efficient  algorithm  to  find  a  saddle  point  without  enumerating  the  entire  payoff  matrix. 
2.1 Single-Stage Game in Matrix Form

Consider the beginning of the game when the state is $(S_A, S_B)$. For notational convenience, write $a_i = V(S_A \setminus \{i\}, S_B)$ for all $i \in S_A$, and $b_j = V(S_A, S_B \setminus \{j\})$ for all $j \in S_B$. That is, $a_i$ is A's win probability if he loses unit $i$ in state $(S_A, S_B)$, and $b_j$ is A's win probability if he kills B's unit $j$ in state $(S_A, S_B)$.

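A pure strategy in state $(S_A, S_B)$ is a fire allocation. For $i \in S_A$, $j \in S_B$, let

$$x_{ij} = \begin{cases} 1, & \text{if A's unit } i \text{ fires at B's unit } j, \\ 0, & \text{otherwise,} \end{cases} \qquad y_{ji} = \begin{cases} 1, & \text{if B's unit } j \text{ fires at A's unit } i, \\ 0, & \text{otherwise.} \end{cases}$$

The set of A's pure strategies is

$$\Pi_A = \Big\{ x = [x_{ij}] : x_{ij} \in \{0, 1\},\ i \in S_A,\ j \in S_B;\ \text{and } \sum_{j \in S_B} x_{ij} = 1 \text{ for all } i \in S_A \Big\}. \tag{1}$$

Because each of A's $m$ units can fire at any of B's $n$ units, the number of A's pure strategies is $|\Pi_A| = n^m$. Similarly, the set of B's pure strategies is

$$\Pi_B = \Big\{ y = [y_{ji}] : y_{ji} \in \{0, 1\},\ i \in S_A,\ j \in S_B;\ \text{and } \sum_{i \in S_A} y_{ji} = 1 \text{ for all } j \in S_B \Big\}, \tag{2}$$

with $|\Pi_B| = m^n$.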

Given A's pure strategy $x$, let

$$\Lambda_j(x) = \sum_{i \in S_A} \lambda_{ij}\, x_{ij} \tag{3}$$

denote the rate at which B's unit $j$ gets killed. In other words, the amount of time it takes for A to kill B's unit $j$ follows an exponential distribution with rate $\Lambda_j(x)$, if A uses pure strategy $x$. Similarly, if B uses pure strategy $y$, let

$$\Theta_i(y) = \sum_{j \in S_B} \theta_{ji}\, y_{ji}$$

denote the rate at which A's unit $i$ gets killed.

If A chooses a pure strategy $x \in \Pi_A$, and B chooses a pure strategy $y \in \Pi_B$, then by conditioning on which unit gets killed next, the probability that A will eventually win the duel is given by

$$f(x, y) = \frac{\sum_{j \in S_B} \Lambda_j(x)\, b_j + \sum_{i \in S_A} \Theta_i(y)\, a_i}{\sum_{j \in S_B} \Lambda_j(x) + \sum_{i \in S_A} \Theta_i(y)}, \tag{4}$$

which is also the payoff to A for the pure strategy pair $(x, y)$. The payoff to B is $1 - f(x, y)$, or equivalently, $-f(x, y)$.
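The payoff in (4) is straightforward to evaluate for one strategy pair. The following is a minimal sketch under assumed conventions that are not part of the paper: units are indexed from 0, `lam[i][j]` and `theta[j][i]` hold the kill rates, and a pure strategy is stored as a target list (`x[i]` is the index of the B unit that A's unit `i` fires at, `y[j]` the index of the A unit that B's unit `j` fires at). The function names are hypothetical.

```python
def kill_rates(lam, theta, x, y):
    """Aggregate kill rates: Lam[j] on B's unit j, The[i] on A's unit i."""
    m, n = len(lam), len(theta)
    Lam = [0.0] * n
    for i, j in enumerate(x):  # A's unit i fires at B's unit j
        Lam[j] += lam[i][j]
    The = [0.0] * m
    for j, i in enumerate(y):  # B's unit j fires at A's unit i
        The[i] += theta[j][i]
    return Lam, The


def payoff(lam, theta, a, b, x, y):
    """A's win probability f(x, y), conditioning on which unit is killed next."""
    Lam, The = kill_rates(lam, theta, x, y)
    num = sum(L * bj for L, bj in zip(Lam, b)) + sum(T * ai for T, ai in zip(The, a))
    den = sum(Lam) + sum(The)
    return num / den


if __name__ == "__main__":
    # Two A units against two B units; all numbers are illustrative.
    lam = [[1.0, 0.5], [1.0, 0.5]]
    theta = [[1.0, 1.0], [2.0, 2.0]]
    a = [1 / 14, 1 / 14]
    b = [7 / 15, 5 / 6]
    print(payoff(lam, theta, a, b, x=[1, 1], y=[0, 0]))
```

Storing a strategy as a target list rather than a 0/1 matrix enforces the constraint that each unit fires at exactly one target.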
In this two-person zero-sum game in standard matrix form, A has $n^m$ pure strategies and B has $m^n$ pure strategies. Kikuta (1986) showed that this matrix game has a saddle point. To determine the saddle point, however, one needed to enumerate the entire payoff matrix of size $n^m$ by $m^n$.

Remark 1 The two-person zero-sum game discussed in this section can be regarded as a special case of a race-to-reward game as follows. Two players, A and B, each have resources to allocate among tasks. A has a set of resources, $S_A$, to allocate among a set of tasks, $T_A$, with allocation of resource $i$ to task $k$ leading to a task completion rate $\lambda_{ik}$, for $i \in S_A$ and $k \in T_A$. Similarly, B has a set of resources, $S_B$, to allocate among a set of tasks, $T_B$, with allocation of resource $j$ to task $l$ leading to a task completion rate $\theta_{jl}$, for $j \in S_B$ and $l \in T_B$. Each task has an associated reward to A, namely $a_k$ for $k \in T_A$, and $b_l$ for $l \in T_B$, with $a_k > b_l$ for all $k \in T_A$ and $l \in T_B$ to avoid triviality. The payoff to A is the reward of the task that is completed first. The game is zero-sum, with A trying to maximize his expected payoff and B trying to minimize it. If $T_A = S_B$ and $T_B = S_A$, then this race-to-reward game reduces to the single-stage duel game described in this section. Although we present our analysis in the context of a single-stage duel game, all the results can be straightforwardly extended to the race-to-reward game.

2.2  Necessary  and  Sufficient  Condition  for  Saddle  Points 

Theorem  1  gives  an  alternative  proof  that  the  matrix  game  in  Section  2.1  has  a  saddle  point. 
The  proof  also  shows  how  to  determine  the  optimal  strategy  if  one  knows  the  value  of  the 
game,  and  facilitates  a  necessary  and  sufficient  condition  for  a  saddle  point,  which  we  present 
in  Theorem  2. 

Theorem 1 Consider the two-person zero-sum game defined by pure strategy sets $\Pi_A$ in (1), $\Pi_B$ in (2), and player A's payoff function $f(x, y)$ in (4). This game has at least one saddle point. In particular, letting $v^*$ denote the value of the game, $x' \in \Pi_A$ a pure strategy that maximizes

$$\sum_{j \in S_B} \Lambda_j(x) \cdot (b_j - v^*), \tag{5}$$

and $y' \in \Pi_B$ a pure strategy that minimizes

$$\sum_{i \in S_A} \Theta_i(y) \cdot (a_i - v^*), \tag{6}$$

then $f(x', y') = v^*$, and $(x', y')$ is a saddle point.

Proof. We prove the theorem by contradiction. First, suppose that $x'$ maximizes (5) and $y'$ minimizes (6), but $f(x', y') > v^*$, or equivalently,

$$0 < \sum_{j \in S_B} \Lambda_j(x')(b_j - v^*) + \sum_{i \in S_A} \Theta_i(y')(a_i - v^*). \tag{7}$$

Because $y'$ minimizes (6), it follows that

$$\sum_{i \in S_A} \Theta_i(y')(a_i - v^*) \le \sum_{i \in S_A} \Theta_i(y)(a_i - v^*), \quad \forall y \in \Pi_B. \tag{8}$$

Adding $\sum_{j \in S_B} \Lambda_j(x')(b_j - v^*)$ to both sides of the preceding, together with (7), we can conclude that

$$0 < \sum_{j \in S_B} \Lambda_j(x')(b_j - v^*) + \sum_{i \in S_A} \Theta_i(y)(a_i - v^*), \quad \forall y \in \Pi_B,$$

or equivalently,

$$f(x', y) > v^*, \quad \forall y \in \Pi_B.$$

In other words, using the pure strategy $x'$, player A can guarantee a payoff strictly greater than $v^*$, showing that the value of the game is strictly greater than $v^*$, which contradicts the assumption that $v^*$ is the value of the game.

Second, by supposing that $f(x', y') < v^*$, we can draw a similar contradiction. Therefore, we have shown that $f(x', y') = v^*$.

To prove that $(x', y')$ is a saddle point, we need to show that $f(x', y) \ge v^*$ for all $y \in \Pi_B$, and $f(x, y') \le v^*$ for all $x \in \Pi_A$. To do so, note that

$$0 = \sum_{j \in S_B} \Lambda_j(x')(b_j - v^*) + \sum_{i \in S_A} \Theta_i(y')(a_i - v^*) \le \sum_{j \in S_B} \Lambda_j(x')(b_j - v^*) + \sum_{i \in S_A} \Theta_i(y)(a_i - v^*),$$

where the equality follows from $f(x', y') = v^*$, and the inequality from adding $\sum_{j \in S_B} \Lambda_j(x')(b_j - v^*)$ to (8). Hence, $f(x', y) \ge v^*$ for all $y \in \Pi_B$. A similar argument shows that $f(x, y') \le v^*$ for all $x \in \Pi_A$. Consequently, $(x', y')$ is a saddle point. □

If we know the value of the game $v^*$, then, according to Theorem 1, the optimal strategy for A is $x'$, which maximizes

$$\sum_{j \in S_B} \Lambda_j(x) \cdot (b_j - v^*) = \sum_{j \in S_B} \sum_{i \in S_A} \lambda_{ij}\, x_{ij} \cdot (b_j - v^*) = \sum_{i \in S_A} \Big( \sum_{j \in S_B} x_{ij} \cdot \lambda_{ij} (b_j - v^*) \Big), \tag{9}$$

where $\Lambda_j(x)$ is defined in (3). Once $v^*$ is known, each of A's units can determine which opposing unit to fire at separately. A's unit $i$ should simply compare $\lambda_{ij}(b_j - v^*)$ for all $j \in S_B$ and find the largest value. In other words, it is optimal for A's unit $i$ to fire at B's unit $j^*$, where

$$j^* = \arg\max_{j \in S_B} \lambda_{ij}(b_j - v^*). \tag{10}$$

In case of a tie, break it arbitrarily, in which case there will be multiple optimal pure strategies and multiple saddle points. The optimal policy is to set $x_{ij^*} = 1$, and $x_{ij} = 0$ for $j \neq j^*$.
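The separability in (9) means that maximizing over all $n^m$ allocations reduces to $m$ independent comparisons, one per unit. The sketch below checks this on a small instance by comparing the per-unit rule (10) with exhaustive search. All function names and numbers are hypothetical illustrations, and `v` is a trial value standing in for $v^*$.

```python
from itertools import product


def objective_A(lam, b, v, x):
    """Sum over A's units of lam[i][x[i]] * (b[x[i]] - v), i.e. the quantity in (9)."""
    return sum(lam[i][j] * (b[j] - v) for i, j in enumerate(x))


def best_response_A(lam, b, v):
    """Per-unit rule: unit i fires at the j that maximizes lam[i][j] * (b[j] - v)."""
    n = len(b)
    return [max(range(n), key=lambda j: lam[i][j] * (b[j] - v)) for i in range(len(lam))]


def brute_force_A(lam, b, v):
    """Exhaustive search over all n**m allocations, for comparison only."""
    m, n = len(lam), len(b)
    return max(product(range(n), repeat=m), key=lambda x: objective_A(lam, b, v, x))


if __name__ == "__main__":
    lam = [[1.0, 0.5], [0.5, 2.0], [2.0, 1.0]]  # 3 units of A, 2 units of B
    b = [0.6, 0.9]
    v = 0.4
    print(best_response_A(lam, b, v), brute_force_A(lam, b, v))
```

The per-unit rule costs on the order of $mn$ comparisons, whereas the exhaustive search grows as $n^m$. B's best response is obtained the same way with `min` in place of `max`.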

It follows immediately from (10) that, if $\lambda_{i_1 j} = \lambda_{i_2 j}$ for all $j$, then there exists an optimal strategy, with which A's units $i_1$ and $i_2$ fire at the same target. This result strengthens Corollary 2 in Kikuta (1986), which requires $\theta_{j i_1} = \theta_{j i_2}$ for all $j$.
Theorem 2 A pair of pure strategies $(x', y')$ is a saddle point, and a real number $v'$ is the value of the game, if and only if all three conditions hold:

C1. $x'$ maximizes $\sum_{j \in S_B} \Lambda_j(x) \cdot (b_j - v')$;

C2. $y'$ minimizes $\sum_{i \in S_A} \Theta_i(y) \cdot (a_i - v')$;

C3. $f(x', y') = v'$.

Proof. From Theorem 1, the game has at least one saddle point. Denote by $(x^*, y^*)$ a saddle point, and $v^*$ the value of the game. It follows immediately that $f(x^*, y^*) = v^*$.

To prove that C1-C3 are sufficient conditions, we need to show $v' = v^*$. To prove $v' = v^*$ by contradiction, first suppose that $v' < v^*$ to get a string of inequalities involving $x'$ and $x^*$ as

$$\sum_{j \in S_B} \Lambda_j(x')(b_j - v') \ge \sum_{j \in S_B} \Lambda_j(x^*)(b_j - v') > \sum_{j \in S_B} \Lambda_j(x^*)(b_j - v^*),$$

where the first inequality follows from C1, while the second inequality follows because of the assumption $v' < v^*$. Similarly, we get another string of inequalities involving $y'$ and $y^*$ as

$$\sum_{i \in S_A} \Theta_i(y')(a_i - v') > \sum_{i \in S_A} \Theta_i(y')(a_i - v^*) \ge \sum_{i \in S_A} \Theta_i(y^*)(a_i - v^*),$$

where the first inequality follows because of the assumption $v' < v^*$, while the second inequality follows from Theorem 1. Adding these two inequalities together, we arrive at

$$\sum_{j \in S_B} \Lambda_j(x')(b_j - v') + \sum_{i \in S_A} \Theta_i(y')(a_i - v') > \sum_{j \in S_B} \Lambda_j(x^*)(b_j - v^*) + \sum_{i \in S_A} \Theta_i(y^*)(a_i - v^*). \tag{11}$$

The left-hand side of the preceding is 0 according to C3, while the right-hand side is also 0 since $f(x^*, y^*) = v^*$. Hence, we arrive at a contradiction.

If we suppose $v' > v^*$ instead, then we can use a similar argument to draw a contradiction. Consequently, we have shown that $v' = v^*$. Finally, using Theorem 1, together with $v' = v^*$, C1, and C2, it follows that $(x', y')$ is a saddle point. Therefore, we have proved that C1-C3 are sufficient conditions.

We next prove that C1-C3 are necessary conditions. To prove C3, we write

$$f(x', y') = v^* = v',$$

where the first equality follows because $(x', y')$ is a saddle point, and the second follows because $v'$ is the value of the game.

To prove C1 and C2, note that because $(x', y')$ is a saddle point, $f(x', y')$ must be the smallest in its row and largest in its column. The former implies that

$$f(x', y') \le f(x', y), \quad \forall y \in \Pi_B,$$

with equality when $y = y'$. Use C3 to replace the left-hand side with $v'$, and use (4) to spell out the right-hand side. After some algebra, the preceding inequality becomes

$$\sum_{j \in S_B} \Lambda_j(x') \cdot (b_j - v') + \sum_{i \in S_A} \Theta_i(y) \cdot (a_i - v') \ge 0, \quad \forall y \in \Pi_B,$$

with equality when $y = y'$. In other words, $y'$ minimizes $\sum_{i \in S_A} \Theta_i(y) \cdot (a_i - v')$, which proves C2. Beginning with

$$f(x', y') \ge f(x, y'), \quad \forall x \in \Pi_A,$$

with equality when $x = x'$, we can use a similar argument to prove C1. Consequently, we have proved that C1-C3 are necessary conditions. □

2.3  Computing  Saddle  Points 

This section presents an iterative algorithm to compute saddle points without enumerating the entire payoff matrix of size $n^m \times m^n$. The algorithm goes as follows.

1. Pick $v$ arbitrarily in $[0, 1]$.

2. For $v \in [0, 1]$, define

$$x(v) = \arg\max_{x \in \Pi_A} \sum_{j \in S_B} \Lambda_j(x) \cdot (b_j - v), \tag{12}$$

$$y(v) = \arg\min_{y \in \Pi_B} \sum_{i \in S_A} \Theta_i(y) \cdot (a_i - v). \tag{13}$$

In case of a tie, break it arbitrarily. Next, compute

$$T(v) = f(x(v), y(v)) = \frac{\sum_{j \in S_B} \Lambda_j(x(v))\, b_j + \sum_{i \in S_A} \Theta_i(y(v))\, a_i}{\sum_{j \in S_B} \Lambda_j(x(v)) + \sum_{i \in S_A} \Theta_i(y(v))}. \tag{14}$$

3. If $T(v) = v$, then $v$ is the value of the game and $(x(v), y(v))$ is a saddle point. If $T(v) \neq v$, then update $v \leftarrow T(v)$, and go to step 2.
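The three steps can be sketched as a short fixed-point iteration. This is an illustrative sketch rather than the authors' implementation: the name `solve_stage`, the 0-based target-list encoding of strategies, the floating-point tolerance `tol` (standing in for the exact test $T(v) = v$), and the iteration cap are all assumptions. The starting value is the midpoint between $\max_i a_i$ and $\min_j b_j$, which is one admissible choice for step 1.

```python
def solve_stage(lam, theta, a, b, tol=1e-12, max_iter=10_000):
    """Iterate v <- T(v) for one single-stage game.

    lam[i][j]:   kill rate of A's unit i firing at B's unit j
    theta[j][i]: kill rate of B's unit j firing at A's unit i
    a[i], b[j]:  A's win probability after losing unit i / killing B's unit j
    Returns (value, x, y, history), where x[i] and y[j] are target indices.
    """
    m, n = len(a), len(b)
    v = 0.5 * (max(a) + min(b))  # step 1: a starting guess in [0, 1]
    history = [v]
    for _ in range(max_iter):
        # Step 2: each unit picks its own best target given the trial value v.
        x = [max(range(n), key=lambda j: lam[i][j] * (b[j] - v)) for i in range(m)]
        y = [min(range(m), key=lambda i: theta[j][i] * (a[i] - v)) for j in range(n)]
        num = sum(lam[i][x[i]] * b[x[i]] for i in range(m)) + sum(
            theta[j][y[j]] * a[y[j]] for j in range(n)
        )
        den = sum(lam[i][x[i]] for i in range(m)) + sum(
            theta[j][y[j]] for j in range(n)
        )
        t = num / den  # T(v) = f(x(v), y(v))
        history.append(t)
        # Step 3: stop at a (numerical) fixed point, otherwise update v.
        if abs(t - v) <= tol:
            return t, x, y, history
        v = t
    raise RuntimeError("no fixed point found within max_iter iterations")


if __name__ == "__main__":
    # Illustrative 2-on-2 instance.
    lam = [[1.0, 0.5], [1.0, 0.5]]
    theta = [[1.0, 1.0], [2.0, 2.0]]
    value, x, y, history = solve_stage(lam, theta, a=[1 / 14, 1 / 14], b=[7 / 15, 5 / 6])
    print(value, x, y, history)
```

Each pass costs on the order of $mn$ operations. The returned `history` lists the successive trial values and can be used to observe how quickly they settle.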

It is worth noting that computing $x(v)$ and $y(v)$ in (12) and (13) does not require linear programming, and can be done quickly, as is the case in (10). When the algorithm stops, we have a triplet $(x(v), y(v), v)$ that satisfies the three conditions in Theorem 2, and is therefore an optimal solution. It follows immediately from Theorem 2 that the value of the game $v^*$ is a fixed point of the function $T(\cdot)$, namely $T(v^*) = v^*$. We next present two lemmas, before proving that the algorithm will stop after a finite number of iterations. Although Lemma 1 can be viewed as a special case of Lemma 2, we state them separately for ease of explanation.

Lemma 1 If $v < v^*$, then $T(v) > v$; if $v > v^*$, then $T(v) < v$.
Proof. From Theorem 1 there exists a saddle point; let $(x^*, y^*)$ denote one. If $v < v^*$, then using the same argument that gives rise to (11), we have that

$$\sum_{j \in S_B} \Lambda_j(x(v))(b_j - v) + \sum_{i \in S_A} \Theta_i(y(v))(a_i - v) > \sum_{j \in S_B} \Lambda_j(x^*)(b_j - v^*) + \sum_{i \in S_A} \Theta_i(y^*)(a_i - v^*).$$

The right-hand side of the preceding is equal to 0, because $f(x^*, y^*) = v^*$. Therefore,

$$\frac{\sum_{j \in S_B} \Lambda_j(x(v))\, b_j + \sum_{i \in S_A} \Theta_i(y(v))\, a_i}{\sum_{j \in S_B} \Lambda_j(x(v)) + \sum_{i \in S_A} \Theta_i(y(v))} > v,$$

or equivalently, $T(v) > v$. If $v > v^*$, then, using a similar argument, we can show that $T(v) < v$, which completes the proof. □

Let $T^{(0)}(v) = v$, and for $k = 1, 2, \ldots$, let $T^{(k)}(v) = T \circ T^{(k-1)}(v)$. The next lemma generalizes Lemma 1.

Lemma 2 If $v < v^*$, then $T^{(k)}(v) > v$, for $k = 1, 2, \ldots$; if $v > v^*$, then $T^{(k)}(v) < v$, for $k = 1, 2, \ldots$.

Proof. Consider the case $v < v^*$, and for notational simplicity write $v_0 = v < v^*$, and $v_k = T^{(k)}(v)$ for $k = 1, 2, \ldots$. We need to show that $v_k > v_0$ for $k = 1, 2, \ldots$.

Because $v_0 < v^*$, it follows from Lemma 1 that $v_1 > v_0$. If $v_1 < v^*$, then it follows from Lemma 1 again that $v_2 > v_1 > v_0$. In other words, the sequence $v_0, v_1, \ldots$ increases strictly until, at some point, it either reaches $v^*$ or exceeds $v^*$. In the former case, all following numbers in the sequence are $v^*$ because $T(v^*) = v^*$, so it is true that $v_k > v_0$ for $k = 1, 2, \ldots$. Suppose now that the sequence $v_0, v_1, \ldots$ exceeds $v^*$ at some point. Let

$$s = \min\{k : v_k > v^*\}.$$

In other words, $v_0 < v_1 < \cdots < v_{s-1} < v^* < v_s$, as depicted in Figure 1. Because $v_s > v^*$, it follows again from Lemma 1 that $v_{s+1} < v_s$. If $v_k \ge v^*$ for $k = s, s+1, \ldots$, then the statement that $v_k > v_0$ for all $k = 1, 2, \ldots$ is also true.

[Figure 1: number line showing $v_0 < v_1 < \cdots < v_{s-1} < v_t < v^* < v_{t-1} < \cdots < v_{s+1} < v_s$.]

Figure 1: This diagram depicts the sequence $v_0, v_1, \ldots, v_s, \ldots, v_t, \ldots$, where $v_k = T^{(k)}(v_0)$, for $k = 0, 1, \ldots$. Each new number in the sequence either gives a better lower bound or a better upper bound for $v^*$, and the sequence converges to $v^*$ after a finite number of iterations.


To complete the proof, suppose now that the sequence $v_s, v_{s+1}, \ldots$ drops below $v^*$ at some point, and let

$$t = \min\{k : k > s,\ v_k < v^*\}.$$

In other words, $v_t < v^* < v_{t-1} < \cdots < v_s$, as depicted in Figure 1. We next show that $v_t > v_{s-1}$. To do so, write a string of inequalities

$$\sum_{j \in S_B} \Lambda_j(x(v_{t-1}))(b_j - v_{s-1}) > \sum_{j \in S_B} \Lambda_j(x(v_{t-1}))(b_j - v_{t-1}) \ge \sum_{j \in S_B} \Lambda_j(x(v_{s-1}))(b_j - v_{t-1}) \ge \sum_{j \in S_B} \Lambda_j(x(v_{s-1}))(b_j - v_s),$$

where the first inequality follows because $v_{s-1} < v^* < v_{t-1}$; the second inequality follows from the definition of $x(v_{t-1})$; the third inequality follows because $v_{t-1} \le v_s$. Similarly, write another string of inequalities

$$\sum_{i \in S_A} \Theta_i(y(v_{t-1}))(a_i - v_{s-1}) \ge \sum_{i \in S_A} \Theta_i(y(v_{s-1}))(a_i - v_{s-1}) > \sum_{i \in S_A} \Theta_i(y(v_{s-1}))(a_i - v_s),$$

where the first inequality follows from the definition of $y(v_{s-1})$, and the second inequality follows because $v_{s-1} < v^* < v_s$. Adding these two inequalities gives

$$\sum_{j \in S_B} \Lambda_j(x(v_{t-1}))(b_j - v_{s-1}) + \sum_{i \in S_A} \Theta_i(y(v_{t-1}))(a_i - v_{s-1}) > \sum_{j \in S_B} \Lambda_j(x(v_{s-1}))(b_j - v_s) + \sum_{i \in S_A} \Theta_i(y(v_{s-1}))(a_i - v_s).$$

The right-hand side of the preceding is equal to 0, because $v_s = T(v_{s-1})$. Hence, we arrive at

$$\sum_{j \in S_B} \Lambda_j(x(v_{t-1}))(b_j - v_{s-1}) + \sum_{i \in S_A} \Theta_i(y(v_{t-1}))(a_i - v_{s-1}) > 0,$$

or equivalently,

$$\frac{\sum_{j \in S_B} \Lambda_j(x(v_{t-1}))\, b_j + \sum_{i \in S_A} \Theta_i(y(v_{t-1}))\, a_i}{\sum_{j \in S_B} \Lambda_j(x(v_{t-1})) + \sum_{i \in S_A} \Theta_i(y(v_{t-1}))} > v_{s-1}.$$

Because the left-hand side of the preceding is just $T(v_{t-1}) = v_t$, we conclude that $v_t > v_{s-1} \ge v_0$. By repeating this argument, we can see that $v_k > v_0$, for all $k = 1, 2, \ldots$. The case of $v_0 > v^*$ can be proved in a similar fashion. □


Theorem 3 The algorithm will stop after a finite number of iterations.

Proof. Recall that, in the game matrix, A has $n^m$ pure strategies (rows) and B has $m^n$ pure strategies (columns). There are, at most, $n^m \times m^n$ distinct payoff values in the game matrix, each of which corresponds to $f(x, y)$ for some pure strategy pair $(x, y)$. In addition, at least one of the payoff values is $v^*$, since the game has a saddle point.

Again for notational simplicity, write $v_0 = v$, and $v_k = T^{(k)}(v)$, for $k = 1, 2, \ldots$. To prove the theorem, suppose instead that the algorithm does not stop, or equivalently, $v_k \neq v^*$ for all $k = 0, 1, 2, \ldots$. Because $v_k \neq v^*$, it follows from Lemma 2 that its value will not be repeated in the subsequence $v_{k+1}, v_{k+2}, \ldots$. In other words, all numbers in the sequence $v_0, v_1, v_2, \ldots$ are distinct. Other than $v_0$, however, each number in the sequence $v_1, v_2, \ldots$ corresponds to a payoff value in the game matrix. We then arrive at a contradiction because there are only a finite number of distinct payoff values in the game matrix, which completes the proof. □

The proof of Theorem 3 shows that the algorithm will stop after at most $n^m \times m^n$ iterations. This worst case would happen if all the payoff values in the game matrix are distinct, and if the sequence $T^{(k)}(v)$, $k = 1, 2, \ldots$ visits all these distinct values. In practice, the actual number of iterations required to compute $v^*$ is often far smaller than $n^m \times m^n$, because the sequence gets closer to $v^*$ after each iteration. In particular, as seen from the proof of Lemma 2, each new value generated in the sequence is either $v^*$, or the best lower bound to date if it is less than $v^*$, or the best upper bound to date if it is larger than $v^*$.

One way to speed up the computation is to pick the initial value close to $v^*$ to reduce the number of iterations. To this end, note that $\max_{i \in S_A} a_i \le v^* \le \min_{j \in S_B} b_j$, because A will increase his win probability if he kills any of B's units, and decrease his win probability if any of his units are killed. Hence, an initial pick between $\max_{i \in S_A} a_i$ and $\min_{j \in S_B} b_j$, such as

$$v = \frac{1}{2} \Big( \max_{i \in S_A} a_i + \min_{j \in S_B} b_j \Big),$$

should work well.

This algorithm can be used to recursively compute $V(S_A, S_B)$, the value of the game in state $(S_A, S_B)$. Specifically, we need to compute $V(S'_A, S'_B)$ for all $S'_A \subseteq S_A$ and $S'_B \subseteq S_B$, by iterating on $|S'_A| + |S'_B|$, the total number of units still alive. The case when $|S'_A| + |S'_B| = 1$ is trivial. If we have computed $V(S'_A, S'_B)$ for all states with $|S'_A| + |S'_B| = k$, then those values become the $a_i$ and $b_j$ used to compute $v^*$ for states $(S'_A, S'_B)$ with $|S'_A| + |S'_B| = k + 1$, which, in turn, become the $a_i$ and $b_j$ for the next iteration when $|S'_A| + |S'_B| = k + 2$.

Example 1 Consider an example with three unit types: rock (R), paper (P), and scissors (S). Assume that

$$\lambda_{R,P} = \lambda_{P,S} = \lambda_{S,R} = 0.5,$$

$$\lambda_{R,R} = \lambda_{P,P} = \lambda_{S,S} = 1,$$

$$\lambda_{R,S} = \lambda_{S,P} = \lambda_{P,R} = 2,$$

and $\theta_{ij} = \lambda_{ij}$ for $i, j = R, P, S$. Table 1 gives the probability that A wins the duel in various states. For instance, if A has 2 rocks and B has 1 rock and 1 paper, then A will win the duel with probability $V(RR, RP) = 0.262$.

Table 1: Probability that Player A wins the duel in different states as discussed in Example 1, when there are three unit types: Rock, Paper, and Scissors.

                                     Player B
Player A    PP     RP     RS     PS     RPS    RPP    RSS    PPS    PSS
R          0.022  0.071  0.320  0.133  0.040  0.007  0.178  0.013  0.076
RR         0.111  0.262  0.696  0.375  0.186  0.049  0.533  0.079  0.279
RP         0.304  0.500  0.623  0.377  0.228  0.186  0.358  0.121  0.189
RRR        0.270  0.494  0.906  0.609  0.402  0.152  0.813  0.211  0.514
RRP        0.467  0.673  0.879  0.642  0.463  0.325  0.714  0.286  0.453
RRS        0.721  0.618  0.814  0.812  0.474  0.353  0.675  0.547  0.647
RPS        0.814  0.772  0.772  0.772  0.500  0.526  0.537  0.537  0.526

There is one interesting observation. Whereas, in 1-on-1 and 2-on-2 duels, the win probability depends highly on the unit types on each side, in a 3-on-3 duel having RPS would guarantee a win probability of at least 0.5, regardless of the opponent's three units. It is better to have a balanced force, which makes it difficult for the opponent to exploit the weakness. □
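The recursion over states described in Section 2.3 can be combined with the rates of Example 1 in a few lines. The following is a sketch under stated assumptions, not the authors' code: a state is a pair of sorted tuples of unit types, `RATE[(shooter, target)]` encodes both $\lambda$ and $\theta$ (they coincide in this example), the fixed-point test uses a floating-point tolerance, and the names `stage_value`, `V`, and `win_prob` are hypothetical.

```python
from functools import lru_cache

# Kill rates from Example 1, keyed by (shooter type, target type).
RATE = {
    ("R", "P"): 0.5, ("P", "S"): 0.5, ("S", "R"): 0.5,
    ("R", "R"): 1.0, ("P", "P"): 1.0, ("S", "S"): 1.0,
    ("R", "S"): 2.0, ("S", "P"): 2.0, ("P", "R"): 2.0,
}


def stage_value(lam, theta, a, b, tol=1e-12, max_iter=10_000):
    """Value of one single-stage game via the iteration v <- T(v)."""
    m, n = len(a), len(b)
    v = 0.5 * (max(a) + min(b))
    for _ in range(max_iter):
        x = [max(range(n), key=lambda j: lam[i][j] * (b[j] - v)) for i in range(m)]
        y = [min(range(m), key=lambda i: theta[j][i] * (a[i] - v)) for j in range(n)]
        num = sum(lam[i][x[i]] * b[x[i]] for i in range(m)) + sum(
            theta[j][y[j]] * a[y[j]] for j in range(n)
        )
        den = sum(lam[i][x[i]] for i in range(m)) + sum(
            theta[j][y[j]] for j in range(n)
        )
        t = num / den
        if abs(t - v) <= tol:
            return t
        v = t
    raise RuntimeError("no fixed point found within max_iter iterations")


@lru_cache(maxsize=None)
def V(A, B):
    """A's win probability; A and B are sorted tuples of unit types."""
    if not B:
        return 1.0  # B has been wiped out
    if not A:
        return 0.0  # A has been wiped out
    lam = [[RATE[(i, j)] for j in B] for i in A]
    theta = [[RATE[(j, i)] for i in A] for j in B]
    # Values of the states reached when one unit is killed.
    a = [V(A[:k] + A[k + 1:], B) for k in range(len(A))]
    b = [V(A, B[:k] + B[k + 1:]) for k in range(len(B))]
    return stage_value(lam, theta, a, b)


def win_prob(a_units, b_units):
    """Convenience wrapper taking strings such as 'RR' and 'RP'."""
    return V(tuple(sorted(a_units)), tuple(sorted(b_units)))


if __name__ == "__main__":
    print(round(win_prob("RR", "RP"), 3))
```

Memoizing on sorted type tuples means that states differing only in the labeling of identical units are solved once, which keeps the number of single-stage games small for forces of this size.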


3  One  Against  Many 

Consider the special case when $m = 1$. The optimal strategy for B is clearly for all his remaining units to fire at A's only unit, while A needs to decide in which of the $n!$ possible orders his only unit should fire at B's units. Because A has only one unit, in this section we write $\lambda_j = \lambda_{1j}$, and $\theta_j = \theta_{j1}$ for notational convenience. We also use target $j$ and B's unit $j$ interchangeably.

The problem has been previously studied by Friedman (1977) and Kikuta (1983), where they identified necessary conditions and sufficient conditions for an optimal fire order. In particular, Friedman (1977) showed that if the fire order $1, 2, \ldots, n$ is optimal, then

$$\theta_k \Big( \lambda_k + \theta_k + \sum_{i=k+2}^{n} \theta_i \Big) \ge \theta_{k+1} \Big( \lambda_{k+1} + \theta_{k+1} + \sum_{i=k+2}^{n} \theta_i \Big), \quad \text{for } k = 1, 2, \ldots, n-1, \tag{15}$$

because otherwise, swapping $k$ and $k+1$ results in a better fire order. Equation (15), however, is not a sufficient condition for optimality, as seen by a counterexample given in Kikuta (1983). There is no simple way to rank the targets in a complete list, because whether target $k$ or target $k+1$ should be fired at first depends on the other targets present, as seen by the term $\sum_{i=k+2}^{n} \theta_i$ in (15).
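For small $n$, the win probability of a fire order and the adjacent-swap condition (15) can be checked directly. The sketch below is illustrative and makes its own assumptions: targets are indexed from 0, `lam[j]` and `theta[j]` are the rates for target `j`, all surviving targets fire at A's single unit, and the function names are hypothetical. The brute-force search over all $n!$ orders is included only for comparison.

```python
from itertools import permutations


def win_prob(order, lam, theta):
    """Probability that A's single unit kills every target in the given order."""
    p = 1.0
    threat = sum(theta[j] for j in order)  # total rate at which A's unit is being killed
    for j in order:
        p *= lam[j] / (lam[j] + threat)  # target j dies before A's unit does
        threat -= theta[j]  # target j no longer fires
    return p


def best_order(lam, theta):
    """Exhaustive search over all n! fire orders."""
    return max(permutations(range(len(lam))), key=lambda o: win_prob(o, lam, theta))


def satisfies_friedman(order, lam, theta):
    """Check the adjacent-swap necessary condition (15) along a fire order."""
    for pos in range(len(order) - 1):
        k, k1 = order[pos], order[pos + 1]
        rest = sum(theta[j] for j in order[pos + 2:])
        lhs = theta[k] * (lam[k] + theta[k] + rest)
        rhs = theta[k1] * (lam[k1] + theta[k1] + rest)
        if lhs < rhs:
            return False
    return True


if __name__ == "__main__":
    lam = [1.0, 0.5, 2.0, 1.5]
    theta = [0.3, 1.2, 0.7, 0.9]
    order = best_order(lam, theta)
    print(order, win_prob(order, lam, theta), satisfies_friedman(order, lam, theta))
```

Because (15) is necessary but not sufficient, `satisfies_friedman` can return `True` for orders that are not optimal; it is useful for pruning candidates rather than for certifying an answer.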

Intuitively, A prefers to fire at a target that is easier to kill so as to eliminate a target sooner. He should also prefer to kill a target that poses a bigger threat. That is, if $\lambda_1 > \lambda_2$ and $\theta_1 > \theta_2$, then it is intuitive that A should kill target 1 before trying to kill target 2, regardless of the other targets still alive. It turns out this conjecture is true. The next theorem presents a slightly weaker condition than the preceding one, under which it is possible to rank the preference between two targets, regardless of the other targets still alive.

Theorem 4 If either

1. $\theta_1 > \theta_2$ and $\lambda_1 + \theta_1 \ge \lambda_2 + \theta_2$, or

2. $\theta_1 = \theta_2$ and $\lambda_1 > \lambda_2$,

then target 1 stands higher than target 2 in the optimal fire order, regardless of the other
targets.

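Proof. Consider fire order 1:

\[
\ldots, 1, i_1, \ldots, i_k, 2, j_1, \ldots, j_l.
\]

Let $\alpha = \sum_{s=1}^{k} \theta_{i_s}$ and $\beta = \sum_{s=1}^{l} \theta_{j_s}$ for notational
convenience. The probability that A wins with fire order 1 is

\[
\begin{aligned}
& P\{\text{wipe out all targets in front of target 1 before getting killed}\} \\
& \quad \times \frac{\lambda_1}{\lambda_1 + \theta_1 + \alpha + \theta_2 + \beta}
\left( \prod_{s=1}^{k} \frac{\lambda_{i_s}}{\lambda_{i_s} + \sum_{r=s}^{k} \theta_{i_r} + \theta_2 + \beta} \right)
\frac{\lambda_2}{\lambda_2 + \theta_2 + \beta} \\
& \quad \times P\{\text{wipe out targets } j_1, \ldots, j_l \text{ before getting killed}\}.
\end{aligned}
\tag{16}
\]

Swap targets 1 and 2 in fire order 1 to get fire order 2:

\[
\ldots, 2, i_1, \ldots, i_k, 1, j_1, \ldots, j_l.
\]

The probability that A wins with fire order 2 is

\[
\begin{aligned}
& P\{\text{wipe out all targets in front of target 2 before getting killed}\} \\
& \quad \times \frac{\lambda_2}{\lambda_2 + \theta_1 + \alpha + \theta_2 + \beta}
\left( \prod_{s=1}^{k} \frac{\lambda_{i_s}}{\lambda_{i_s} + \sum_{r=s}^{k} \theta_{i_r} + \theta_1 + \beta} \right)
\frac{\lambda_1}{\lambda_1 + \theta_1 + \beta} \\
& \quad \times P\{\text{wipe out targets } j_1, \ldots, j_l \text{ before getting killed}\}.
\end{aligned}
\tag{17}
\]

The first term and the last term in equations (16) and (17) are identical. In addition, the
product term in the middle in (16) is at least as large as the product term in the middle in
(17), because of the assumption $\theta_1 \ge \theta_2$ in the theorem. Hence, fire order 1 is
strictly better than fire order 2 if

\[
\frac{\lambda_1}{\lambda_1 + \theta_1 + \alpha + \theta_2 + \beta} \cdot \frac{\lambda_2}{\lambda_2 + \theta_2 + \beta}
>
\frac{\lambda_2}{\lambda_2 + \theta_1 + \alpha + \theta_2 + \beta} \cdot \frac{\lambda_1}{\lambda_1 + \theta_1 + \beta}.
\]

If either condition stated in the theorem is met, then one can verify that

\[
\begin{aligned}
& (\lambda_1 + \theta_1 + \alpha + \theta_2 + \beta)(\lambda_2 + \theta_2 + \beta)
- (\lambda_2 + \theta_1 + \alpha + \theta_2 + \beta)(\lambda_1 + \theta_1 + \beta) \\
& \quad = \theta_2(\lambda_2 + \theta_2) + (\lambda_2 + \theta_2)\alpha + \theta_2 \beta
- \theta_1(\lambda_1 + \theta_1) - (\lambda_1 + \theta_1)\alpha - \theta_1 \beta < 0,
\end{aligned}
\]

which completes the proof. □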
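
As a numerical sanity check on Theorem 4, one can enumerate all fire orders on small random instances and confirm that the best order never places a dominated target ahead of the target that dominates it. The sketch below is an illustration only: the helper names, the rate ranges, and the random seed are assumptions, and targets are indexed from 0.

```python
import random
from itertools import permutations

def win_probability(order, lam, theta):
    """Probability that A's single unit kills all targets in `order` before dying."""
    remaining_threat = sum(theta[j] for j in order)
    prob = 1.0
    for j in order:
        prob *= lam[j] / (lam[j] + remaining_threat)
        remaining_threat -= theta[j]
    return prob

def theorem4_prefers(a, b, lam, theta):
    """True if target a outranks target b by condition 1 or 2 of Theorem 4."""
    cond1 = theta[a] > theta[b] and lam[a] + theta[a] >= lam[b] + theta[b]
    cond2 = theta[a] == theta[b] and lam[a] > lam[b]
    return cond1 or cond2

def check_random_instances(trials=200, n=4, seed=0):
    """Return False if some brute-force optimum contradicts a Theorem 4 ranking."""
    rng = random.Random(seed)
    for _ in range(trials):
        lam = [rng.uniform(0.1, 2.0) for _ in range(n)]
        theta = [rng.uniform(0.1, 2.0) for _ in range(n)]
        best = max(permutations(range(n)),
                   key=lambda o: win_probability(o, lam, theta))
        for a in range(n):
            for b in range(n):
                if a != b and theorem4_prefers(a, b, lam, theta):
                    # Target a must be fired at before target b.
                    if best.index(a) > best.index(b):
                        return False
    return True

print(check_random_instances())
```

A check of this kind cannot replace the proof, but it is a quick way to catch a mistake when transcribing the conditions.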


A natural question to ask is whether Theorem 4 can be extended to the case when A has
$m \ge 2$ units. In other words, if each of the $m$ units can individually rank all of B's units
according to the condition in Theorem 4, then does each unit's optimal fire order collectively
give rise to the group optimal policy? It turns out that is not the case, as seen in the
counterexample below.


Example 2 Consider an example with $m = 2$ and $n = 2$, with their kill-rate matrices given
as follows:

\[
[\lambda_{ij}] = \begin{bmatrix} 0.9 & 1 \\ 1 & 0.9 \end{bmatrix},
\qquad
[\theta_{ji}] = \begin{bmatrix} 1.1 & 1 \\ 1 & 1.1 \end{bmatrix}.
\]

In state $(\{1\}, \{1, 2\})$, from the standpoint of A's unit 1, the rates

\[
\lambda_{11} = 0.9, \quad \lambda_{12} = 1, \quad \theta_{11} = 1.1, \quad \theta_{21} = 1
\]

meet the condition in Theorem 4. Therefore, in state $(\{1\}, \{1, 2\})$ it is optimal for A's unit
1 to fire at B's unit 1. For the same reason, in state $(\{2\}, \{1, 2\})$ it is optimal for A's unit 2
to fire at B's unit 2.

If A still has both units and B also has both units, however, then A's pure strategy $x$, in
which unit 1 fires at B's unit 1 and unit 2 fires at B's unit 2, is not optimal, as another pure
strategy $x'$, in which unit 1 fires at B's unit 2 and unit 2 fires at B's unit 1, can do strictly
better, because

\[
\lambda_1(x') = 1 > 0.9 = \lambda_1(x),
\qquad
\lambda_2(x') = 1 > 0.9 = \lambda_2(x).
\]

As a matter of fact, $x'$ is optimal for A in state $(\{1, 2\}, \{1, 2\})$. Hence, even if each unit can
individually rank all opposing units according to the condition in Theorem 4, collectively,
these individual rankings do not necessarily give rise to the group optimal policy. □
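
The rate comparison in Example 2 can be reproduced with a few lines of Python. The sketch below is illustrative: the helper names are hypothetical, units are indexed from 0, and exact fractions are used so that the equality $\lambda_1 + \theta_1 = \lambda_2 + \theta_2$ is not disturbed by floating-point rounding. It checks only the individual rankings and the resulting kill rates on each target; it does not solve the full game.

```python
from fractions import Fraction as F

# Kill-rate matrices from Example 2 (0-indexed).
# lam[i][j]: rate at which A's unit i kills B's unit j.
# theta[j][i]: rate at which B's unit j kills A's unit i.
lam = [[F("0.9"), F(1)], [F(1), F("0.9")]]
theta = [[F("1.1"), F(1)], [F(1), F("1.1")]]

def preferred_target(i):
    """Target that A's unit i attacks first when it fights alone, according
    to condition 1 or 2 of Theorem 4 (None if neither target is ranked)."""
    for a, b in ((0, 1), (1, 0)):
        th_a, th_b = theta[a][i], theta[b][i]
        la, lb = lam[i][a], lam[i][b]
        if (th_a > th_b and la + th_a >= lb + th_b) or (th_a == th_b and la > lb):
            return a
    return None

def rates_on_targets(assignment):
    """Total kill rate on each of B's units when A's unit i fires at assignment[i]."""
    totals = [F(0), F(0)]
    for i, j in enumerate(assignment):
        totals[j] += lam[i][j]
    return totals

x = tuple(preferred_target(i) for i in range(2))  # strategy built from individual rankings
x_prime = (1, 0)                                  # each unit takes the other target

print("x  =", x, [float(r) for r in rates_on_targets(x)])
print("x' =", x_prime, [float(r) for r in rates_on_targets(x_prime)])
```

Under `x` each of B's units is attacked at rate 0.9, whereas under `x_prime` each is attacked at rate 1, which is the comparison stated in the example.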


To conclude this section, we give a condition weaker than the one in Theorem 4, under
which the fire order $1, 2, \ldots, n$ is optimal. Theorem 2 in Kikuta (1983) also gives a sufficient
condition for the fire order $1, 2, \ldots, n$ to be optimal. It is straightforward to show that
Kikuta's condition implies the one in Corollary 1; however, the condition in Corollary 1 is
much easier to verify.

Corollary 1 If

\[
\theta_1 \ge \theta_2 \ge \cdots \ge \theta_n,
\]

and

\[
\theta_1(\lambda_1 + \theta_1) \ge \theta_2(\lambda_2 + \theta_2) \ge \cdots \ge \theta_n(\lambda_n + \theta_n),
\]

then the fire order $1, 2, \ldots, n$ is optimal.

Proof. Consider an arbitrary fire order $\ldots, i, j, \ldots$ other than $1, 2, \ldots, n$, where
$i > j$. For any constant $D \ge 0$, we can verify that

\[
\theta_i(\lambda_i + \theta_i + D) \le \theta_j(\lambda_j + \theta_j + D).
\]

Hence, according to (15), swapping $i$ and $j$ results in another fire order that is at least as
good.

Starting with an arbitrary fire order, we can repeatedly look for adjacent targets that are
out of order and swap them (analogous to bubble sort), so that in each step we get a new
fire order that is at least as good. When no such swapping is possible, we arrive at the fire
order $1, 2, \ldots, n$, which is at least as good as the initial fire order. Because this argument
works for any initial fire order, the fire order $1, 2, \ldots, n$ is optimal. □
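
The bubble-sort argument can be traced numerically. The following Python sketch is an illustration under stated assumptions rather than part of the proof: the function names and the rates are hypothetical, targets are indexed from 0, and the rates are chosen so that both chains in Corollary 1 hold.

```python
from itertools import permutations

def win_probability(order, lam, theta):
    """Probability that A's single unit kills all targets in `order` before dying."""
    remaining_threat = sum(theta[j] for j in order)
    prob = 1.0
    for j in order:
        prob *= lam[j] / (lam[j] + remaining_threat)
        remaining_threat -= theta[j]
    return prob

def corollary1_holds(lam, theta):
    """Check both monotone chains of Corollary 1 for the order 0, 1, ..., n-1."""
    score = [t * (l + t) for l, t in zip(lam, theta)]
    return all(theta[k] >= theta[k + 1] and score[k] >= score[k + 1]
               for k in range(len(lam) - 1))

def bubble_to_identity(order, lam, theta):
    """Swap adjacent out-of-order targets until sorted, recording the
    win probability of the initial order and after every swap."""
    order = list(order)
    history = [win_probability(order, lam, theta)]
    swapped = True
    while swapped:
        swapped = False
        for pos in range(len(order) - 1):
            if order[pos] > order[pos + 1]:
                order[pos], order[pos + 1] = order[pos + 1], order[pos]
                history.append(win_probability(order, lam, theta))
                swapped = True
    return order, history

# Hypothetical rates satisfying Corollary 1.
lam = [1.0, 1.5, 0.5, 2.0]
theta = [2.0, 1.5, 1.0, 0.5]

final_order, history = bubble_to_identity([3, 2, 1, 0], lam, theta)
best = max(permutations(range(4)), key=lambda o: win_probability(o, lam, theta))
print(corollary1_holds(lam, theta), final_order, best)
print(history)
```

When the condition of the corollary holds, the recorded win probabilities never decrease from one swap to the next, and the sorted order agrees with the brute-force optimum.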